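Collaborative filtering problems are commonly solved with matrix completion techniques, which recover the missing values of the user-item interaction matrix. In this matrix, the position of a rating identifies the user who gave it and the item that was rated. Previous matrix completion techniques tend to ignore the position of each element (user, item, and rating) in the matrix and instead focus mainly on the semantic similarity between users and items to predict the missing values. This paper proposes Super-Rec, a novel position-enhanced user/item representation training model for recommendation. We first use a relative positional rating encoding to capture position-enhanced rating information and its user-item relationship, and store it in embeddings of fixed dimension that are unaffected by the matrix size. We then apply the trained position-enhanced user and item representations to the simplest traditional machine learning models to highlight the pure novelty of our representation model. We contribute the first formal introduction and quantitative analysis of position-enhanced item representations in the recommendation domain, together with a principled discussion of Super-Rec's superior performance on typical collaborative filtering recommendation tasks with both explicit and implicit feedback.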
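As a point of reference for the matrix completion setting described above, the following is a minimal sketch of plain matrix factorization trained with stochastic gradient descent. It is not the Super-Rec model and contains no positional encoding; the toy ratings, the function names (`factorize`, `predict`, `rmse`), and all hyperparameters are assumptions chosen for illustration.

```python
import random

def predict(P, Q, u, i):
    # Predicted rating is the dot product of user and item factors.
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(ratings, P, Q):
    # Root-mean-square error over the observed entries only.
    sq = [(r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items()]
    return (sum(sq) / len(sq)) ** 0.5

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02,
              epochs=500, seed=0):
    # Hypothetical hyperparameters; small positive init avoids the zero saddle.
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(P, Q, u, i)
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

# Toy user-item matrix (4 users x 3 items); absent keys are the missing values.
ratings = {(0, 0): 5, (0, 1): 3, (1, 0): 4, (1, 2): 1,
           (2, 1): 1, (2, 2): 5, (3, 0): 1, (3, 2): 4}

P0, Q0 = factorize(ratings, 4, 3, epochs=0)   # untrained factors
P, Q = factorize(ratings, 4, 3)               # trained factors
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P, Q)
missing = predict(P, Q, 0, 2)                 # estimate for an unobserved cell
```

Here the position `(u, i)` is used only as an index into the factor tables, which is the kind of position-agnostic treatment the abstract contrasts with its position-enhanced representations.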
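Online hate speech detection has become important with the growth of digital devices, but resources for languages other than English are extremely limited. We introduce K-MHAS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides multi-label classification with 1 to 4 labels per utterance, addressing subjectivity and intersectionality. We evaluate strong baselines on K-MHAS. KR-BERT with a sub-character tokenizer performs best, recognizing decomposed characters in each hate speech class.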
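Neural vocoders based on generative adversarial networks (GANs) are widely used because of their fast inference speed and lightweight networks, while still generating high-quality speech waveforms. Since the perceptually important speech components are concentrated mainly in the low-frequency band, most GAN-based neural vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments we observed that multi-scale analysis focused on the low-frequency band causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the quality of the synthesized speech waveform. Therefore, in this paper we investigate the relationship between these artifacts and GAN-based neural vocoders and propose a GAN-based neural vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate waveforms from various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also use a pseudo quadrature mirror filter bank to obtain downsampled multi-band waveforms while avoiding aliasing. Experimental results show that Avocodo outperforms conventional GAN-based neural vocoders in both speech and singing voice synthesis tasks and can synthesize artifact-free speech. In particular, Avocodo is even able to reproduce high-quality waveforms for unseen speakers.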
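The aliasing that arises from downsampling can be illustrated with a small sketch. It does not reproduce Avocodo or its filter bank; it only shows, under assumed frequencies and sample rates, that dropping every other sample of a 3000 Hz tone sampled at 8000 Hz yields exactly the samples of a (sign-flipped) 1000 Hz tone at 4000 Hz. The helper names `sine` and `naive_downsample` are hypothetical.

```python
import math

def sine(freq_hz, sample_rate_hz, n_samples):
    # Sampled sinusoid with unit amplitude and zero phase.
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def naive_downsample(signal, factor):
    # Keep every `factor`-th sample with no anti-aliasing filter.
    return signal[::factor]

original = sine(3000, 8000, 64)             # 3000 Hz tone at 8000 Hz
decimated = naive_downsample(original, 2)   # effective rate is now 4000 Hz

# The new Nyquist limit is 2000 Hz, so 3000 Hz folds back to 1000 Hz
# (with inverted sign): sin(1.5*pi*n) == -sin(0.5*pi*n).
alias = [-s for s in sine(1000, 4000, len(decimated))]
max_diff = max(abs(a - b) for a, b in zip(decimated, alias))
```

After decimation the two signals are indistinguishable, which is why a filter bank or low-pass filter is normally applied before the sample rate is reduced.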
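End-to-end learning models that take raw waveforms as input have shown superior performance in many audio recognition tasks. However, most model architectures are based on convolutional neural networks (CNNs), which were developed mainly for visual recognition tasks. In this paper, we propose an extension of squeeze-and-excitation networks (SENets) that uses a recurrent module to add temporal feedback control from top-layer features to the channel-wise feature activations in lower layers. This is analogous to the adaptive gain control mechanism of outer hair cells in the human auditory system. We apply the proposed model to speech command recognition and show that it slightly outperforms SENets and other CNN-based models. We also investigate the details of the performance improvement by conducting failure analysis and visualizing the channel-wise feature scaling induced by the temporal feedback.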
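For readers unfamiliar with channel-wise gating, here is a minimal sketch of a standard squeeze-and-excitation step on a one-dimensional feature map. It omits the recurrent temporal feedback that the abstract proposes; the function name `se_block`, the weight shapes, and the example values are assumptions for illustration.

```python
import math

def se_block(features, w1, w2):
    """features: list of channels, each a list of time samples.
    w1: hidden x channels weights, w2: channels x hidden weights (hypothetical)."""
    # Squeeze: global average over time gives one value per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, features)]
    return scaled, gates

features = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]   # 2 channels, 3 time steps
w1 = [[0.5, -0.25]]                              # 1 hidden unit
w2 = [[1.0], [-1.0]]                             # back to 2 channels
scaled, gates = se_block(features, w1, w2)
```

In this static form the gates depend only on the current layer's averaged activations; the extension described above would instead drive them from top-layer features over time.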
In the robotics and computer vision communities, surveillance tasks such as human detection, tracking, and motion recognition with a camera have been studied extensively. Deep learning algorithms are widely used for these tasks, as in other computer vision tasks. However, existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme conditions such as harsh weather and low illuminance. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics, along with methods for exploiting it with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
The coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and has since become a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the presence of radiographic abnormalities in their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model is proposed to aid radiologists in diagnosing COVID-19 patients. First, this work conducted a comparative study of the performance of modified VGG-16, ResNet-50 and DenseNet-121 in classifying CXR images as normal, COVID-19 or viral pneumonia. Then, the impact of image augmentation on the classification results was evaluated. The publicly available COVID-19 Radiography Database was used throughout this study. In the comparison, ResNet-50 achieved the highest accuracy, 95.88%. Next, after training ResNet-50 on a dataset augmented with rotation, translation, horizontal flip, intensity shift and zoom, the accuracy dropped to 80.95%. Furthermore, an ablation study on the effect of image augmentation found that the combination of rotation and intensity shift augmentations obtained an accuracy of 96.14%, higher than the baseline. Finally, ResNet-50 with rotation and intensity shift augmentations performed best and is proposed as the final classification model in this work. These findings demonstrate that the proposed classification model can provide promising results for COVID-19 diagnosis.
Feature acquisition algorithms address the problem of acquiring informative features while balancing acquisition costs, in order to improve the learning performance of ML models. Previous approaches have focused on calculating the expected utility of features to determine the acquisition sequence. Other approaches formulated the problem as a Markov Decision Process (MDP) and applied reinforcement learning based algorithms. In comparison to previous approaches, we focus on 1) formulating the feature acquisition problem as an MDP and applying Monte Carlo Tree Search, 2) calculating the intermediary rewards for each acquisition step based on model improvements and acquisition costs, and 3) simultaneously optimizing model improvement and acquisition costs with multi-objective Monte Carlo Tree Search. With Proximal Policy Optimization and Deep Q-Network algorithms as benchmarks, we show the effectiveness of our proposed approach in an experimental study.
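One way to picture an intermediary reward built from model improvement and acquisition cost is sketched below. The abstract does not give the actual formula, so the linear trade-off, the `cost_weight` parameter, and the example scores and costs are all assumptions made for illustration.

```python
def step_reward(score_before, score_after, cost, cost_weight=0.01):
    # Assumed form: gain in model score minus a weighted acquisition cost.
    return (score_after - score_before) - cost_weight * cost

def episode_rewards(scores, costs, cost_weight=0.01):
    # scores[t] is the model score after t acquisitions; costs[t] is the
    # cost of the feature acquired at step t (hypothetical values).
    return [step_reward(scores[t], scores[t + 1], costs[t], cost_weight)
            for t in range(len(costs))]

scores = [0.60, 0.70, 0.72]   # model score before and after each acquisition
costs = [1.0, 5.0]            # cost of the first and second acquired feature
rewards = episode_rewards(scores, costs)
```

With these numbers the first acquisition is rewarded positively and the second negatively, since its small gain does not cover its cost. A multi-objective search, as mentioned in the abstract, would keep the two terms separate rather than folding them into one scalar.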
Uniform-precision neural network quantization has gained popularity because it simplifies densely packed arithmetic units for high computing capability. However, it ignores the heterogeneous sensitivity of layers to quantization errors, resulting in sub-optimal inference accuracy. This work proposes a novel neural architecture search called neural channel expansion that adjusts the network structure to alleviate accuracy degradation from ultra-low uniform-precision quantization. The proposed method selectively expands channels for the quantization-sensitive layers while satisfying hardware constraints (e.g., FLOPs, PARAMs). Based on in-depth analysis and experiments, we demonstrate that the proposed method can adapt the channels of several popular networks to achieve superior 2-bit quantization accuracy on CIFAR10 and ImageNet. In particular, we achieve the best-to-date Top-1/Top-5 accuracy for 2-bit ResNet50 with fewer FLOPs and a smaller parameter size.
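To make the quantization error concrete, the following sketch applies a simple uniform min-max quantizer at 2 bits to a handful of weights. It is a generic illustration, not the quantizer or the channel expansion search from the paper; the function name `uniform_quantize` and the sample weights are assumptions.

```python
def uniform_quantize(values, bits):
    # Map each value to the nearest of 2**bits evenly spaced levels
    # spanning [min(values), max(values)].
    n_steps = 2 ** bits - 1
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values), 0.0
    step = (hi - lo) / n_steps
    quantized = [lo + round((v - lo) / step) * step for v in values]
    return quantized, step

weights = [-1.0, -0.4, 0.1, 1.0]            # hypothetical layer weights
quantized, step = uniform_quantize(weights, bits=2)
errors = [abs(w - q) for w, q in zip(weights, quantized)]
max_error = max(errors)                      # bounded by half a step
```

At 2 bits only four levels are available, so a layer whose weights are sensitive to errors of up to half a step loses more accuracy than others, which is the per-layer sensitivity the abstract refers to.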
This study introduces and examines the potential of an AI system to generate health awareness messages. The topic of folic acid, a vitamin that is critical during pregnancy, served as a test case. Using prompt engineering, we generated messages that could be used to raise awareness and compared them to retweeted human-generated messages via computational and human evaluation methods. The system was easy to use and prolific, and computational analyses revealed that the AI-generated messages were on par with human-generated ones in terms of sentiment, reading ease, and semantic content. Also, the human evaluation study showed that AI-generated messages ranked higher in message quality and clarity. We discuss the theoretical, practical, and ethical implications of these results.
We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g. real-world to simulated kitchen) and agent embodiments (e.g. bimanual human demonstration to robotic arm). We evaluate on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks, using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g. collected as human videos, for learning in any number of target domains.